Abstract

DNA sequencing is increasingly being used to assist in species identification in order to overcome the taxonomic impediment. However, few studies attempt to compare the results of these molecular studies with a more traditional species delineation approach based on morphological characters. A 636 base pair fragment of the mitochondrial Cytochrome oxidase subunit 1 (COI) gene was sequenced from 47 ants of the genus Pheidole (Formicidae: Myrmicinae) collected in the Brazilian Atlantic Forest to test whether the morphology-based assignment of individuals into species is supported by DNA-based species delimitation. Twenty morphospecies were identified, whereas the barcoding analysis identified 19 Molecular Operational Taxonomic Units (MOTUs). Fifteen of the 19 DNA-based clusters, delimited using sequence divergence thresholds of 2% and 3%, matched morphospecies. Both thresholds yielded the same number of MOTUs. Only one MOTU was successfully identified to species level using the COI sequences of Pheidole species already in GenBank. The average pairwise sequence divergence for all 47 sequences was 19%, ranging from 0 to 25%. In some cases, however, morphology-based and molecular-based methods differed in their assignment of individuals to morphospecies or MOTUs. The occurrence of distinct mitochondrial lineages within morphological species highlights groups for further detailed genetic and morphological studies, and therefore a pluralistic approach using several methods to understand the taxonomy of difficult lineages is advocated.
Introduction

Identifying species can be difficult, often requiring specialized knowledge and thereby representing a limiting factor in biodiversity inventories (Monaghan et al. 2005). Therefore, based on the growing concern over the threats to biodiversity, recent publications have emphasized the need to accelerate the analysis of biodiversity (Brooks et al. 2004; Smith et al. 2005) either by using morphospecies (Hammond 1994; Oliver and Beattie 1996; Barratt et al. 2003; Krell 2004) or DNA-based methods (Floyd et al. 2002; Hebert et al. 2003a; Tautz et al. 2003; Blaxter 2004). Both morphological and molecular approaches have faced criticism (Pires and Marinoni 2010) due to the deficiencies encountered when using only a single approach for species identification (Knowlton 1993; Jarman and Elliot 2000; Rubinoff 2006; Rubinoff et al. 2006). The comparison of results obtained by various approaches can aid in overcoming methodological issues in species identification (Mengual et al. 2006; Smith et al. 2008). A further advantage of integrating molecular and morphological approaches (Dayrat 2005; Cardoso et al. 2009) is that it promotes taxonomic stability (Padial et al. 2010).

In this paper, a single gene, Cytochrome oxidase subunit 1 (COI) (as proposed by Hebert et al. 2003a, b), was used to barcode morphologically pre-defined species (morphospecies) of the hyperdiverse ant genus Pheidole (Formicidae: Myrmicinae). The aim was to evaluate how well DNA barcoding enables the definition of Molecular Operational Taxonomic Units (MOTUs; Floyd et al. 2002; Blaxter et al. 2005), and then to link the delineated MOTUs to the morphospecies in order to assess congruence between the two approaches. This study focused on Pheidole samples from a region in the Brazilian Atlantic Forest because the region is considered one of the "hottest hotspots" of biodiversity (Myers et al. 2000).

Materials and Methods 

Research area 

The study was carried out in the Rio Cachoeira Nature Reserve (25° 18' 51" S, 48° 41' 45" W) located near the city of Antonina, in the coastal region of the Brazilian state of Paraná. Specimens were obtained between June and September 2003 from leaf litter ants sampled across 12 study sites, representing four stages of secondary forest succession. There were three replicates (sites) for each succession stage, and the replicate sites of a particular succession stage were separated by a mean distance of 4 km (range = 1-6 km). At each study site, two parallel 50 m transects, separated by 20 m, were established, and leaf litter samples (1 m² each) were collected at 5 m intervals along these transects (10 sampling points per transect). For more details on sampling methodology, see Bihn et al. (2008, 2010).

The landscape varies from littoral plains with isolated hills to the uplands of the Serra do Mar mountain range. Lowland and submontane forests originally covered this area, but these dense ombrophilous forests have been intensely exploited. Old growth forests remain only in the hillside regions. The resulting landscape mosaic consists of old growth forests, secondary forests in various stages of succession, and pastures (Bihn et al. 2008, 2010).

Definition of morphospecies 

Pheidole specimens were identified to species with the key for Neotropical species given by Wilson (2003). In cases where identification was not possible with this key (e.g., when major workers were not collected) or led to ambiguous results, ants were sorted into morphospecies using characters described by Wilson (2003). In addition, morphometric measurements were made to aid in the assignment of specimens to morphospecies (for details on the set of measurements taken and their definitions, see Longino 2008). The morphometric characters used were: head width, distance between eyes, head length, anterior head length, scape length, mandible length, eye length, eye width, mesosoma length, promesonotal groove depth, propodeal spine length, femur length, tibia length, propodeal spiracle width, petiole width, and postpetiole width. The morphometric data were first standardized to mean = 0 and variance = 1, and a hierarchical clustering of the morphospecies occurring in the Rio Cachoeira Nature Reserve was then performed using the average linkage method (Figure 1).
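
The standardization and average linkage clustering described above can be illustrated with a minimal sketch. It is not the authors' script: the measurements below are hypothetical, only three characters are shown, and Euclidean distance and the sample standard deviation are assumptions (the text names only the linkage method).

```python
import math
from statistics import mean, stdev


def standardize(rows):
    """Scale every column (character) to mean 0 and sample variance 1."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [stdev(c) for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in rows]


def euclidean(a, b):
    """Euclidean distance between two specimens (an assumption of this sketch)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(points, n_clusters):
    """Merge the two clusters with the smallest mean pairwise distance
    until n_clusters remain; returns sorted lists of row indices."""
    clusters = [[i] for i in range(len(points))]

    def dist(c1, c2):
        return mean(euclidean(points[i], points[j]) for i in c1 for j in c2)

    while len(clusters) > n_clusters:
        pairs = [(dist(clusters[a], clusters[b]), a, b)
                 for a in range(len(clusters))
                 for b in range(a + 1, len(clusters))]
        _, a, b = min(pairs)
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return sorted(sorted(c) for c in clusters)


# Hypothetical measurements in mm: head width, head length, scape length.
traits = [
    [0.52, 0.58, 0.60],
    [0.53, 0.59, 0.61],
    [0.80, 0.85, 0.70],
    [0.82, 0.86, 0.72],
    [1.10, 1.20, 0.95],
]
scaled = standardize(traits)
groups = average_linkage(scaled, n_clusters=3)
print(groups)
```

Standardizing first keeps characters measured on larger scales (e.g., mesosoma length) from dominating the distances between specimens.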

DNA extraction, amplification, and sequencing

Field collections were preserved in 95% ethanol until DNA extraction. Specimens that had already been examined and assigned to Pheidole morphospecies using morphological taxonomy were used for DNA extraction. Mitochondrial DNA was isolated from at least two minor workers of each morphospecies using the Qiagen DNeasy tissue extraction kit (Qiagen, www.qiagen.com) following the manufacturer's protocols. For rare species, DNA was extracted from a single individual, in which case either two legs or the whole individual was used.

Figure 1. Hierarchical clustering using the average linkage method on morphometric characters of the genus Pheidole from Rio Cachoeira Nature Reserve in the Brazilian Atlantic Forest, which defines 20 morphospecies. [Dendrogram; axis label: dist(traits), hclust (*, "average").] High quality figures are available online.

Polymerase chain reaction (PCR) was conducted with the following reaction volumes: 2-4 µl of DNA template, 2 µl of 10x PCR buffer, 1.6 µl of dNTPs (10 mM), 1 µl of each primer (10 mM), 0.2 µl of Taq DNA polymerase, and distilled water to a total reaction volume of 20 µl. Reaction conditions were: initial denaturation at 95° C for 5 min; 33 cycles of 95° C for 30 sec, 45-52° C for 40-48 sec (annealing time and temperature depended on the primer used), and 72° C for 1 min; and a final elongation at 72° C for 10 min. Reactions were run in an Eppendorf Thermal Cycler. Full-length sequences were amplified using the primer pair LCO1490 (GGTCAACAAATCAAAAGATATTGG) and HCO2198 (TAAACTTTCAGGGTGACCAAAAAATCA) (Folmer et al. 1994). The primer pair LF1 (ATTCAACCAATCATAAAGATATTGG) and LR1 (ATTTGAAGACCTACAGGTTTTTTAGT) (Hebert et al. 2004a) was also used on specimens that were difficult to amplify using primers HCO/LCO. The two primer pairs gave products of the same length. Products were visualized on a 2% agarose gel, and samples containing clean, single bands were purified using the QIAquick PCR purification kit (Qiagen). The purified samples were sent for sequencing (AGOWA genomics, www.agowa.de), with the amplification primers serving as sequencing primers in each case. All samples were sequenced in both directions, and the resulting sequences were aligned using BioEdit version 7.0.9.0 (Hall 1999). The resultant fragments were approximately 658 base pairs (bp) long and were identified as COI fragments of the ant genus Pheidole by BLAST searches in GenBank (Altschul et al. 1997) done between 2008 and 2009. After trimming, the aligned sequences were 636 bp long and free from gaps. A translation with the invertebrate mitochondrial code returned uninterrupted amino acid sequences. These observations support the conclusion that the sequences analyzed were mitochondrial DNA and not nuclear pseudogenes (Bensasson et al. 2001). All sequences were deposited in GenBank under accession numbers JF825012-JF825054 and JF914928-JF914931.
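
The translation check used to screen for pseudogenes can be sketched as a search for in-frame stop codons. This is an illustrative sketch, not the authors' procedure: the function names and toy sequences are hypothetical, and since the reading frame of the trimmed alignment is not stated in the text, all three forward frames are tried. Under the invertebrate mitochondrial code (NCBI translation table 5) only TAA and TAG are stop codons.

```python
# Stop codons under the invertebrate mitochondrial code (NCBI translation table 5).
# In this code TGA encodes tryptophan and AGA/AGG encode serine, so they are not stops.
INVERT_MITO_STOPS = {"TAA", "TAG"}


def stop_codon_positions(seq, frame=0):
    """Return the start index of every stop codon in the given forward frame."""
    seq = seq.upper()
    return [i for i in range(frame, len(seq) - 2, 3)
            if seq[i:i + 3] in INVERT_MITO_STOPS]


def has_open_frame(seq):
    """True if at least one forward frame is free of stop codons."""
    return any(not stop_codon_positions(seq, frame) for frame in range(3))


# Hypothetical toy sequences, not data from this study.
print(has_open_frame("ATGTGAGGAATTTGA"))       # TGA is read as Trp in table 5
print(stop_codon_positions("ATGTAAGGATAGC"))   # stops at TAA and TAG in frame 0
```

A protein-coding fragment that shows stop codons in every frame would be a candidate nuclear pseudogene; an open frame alone does not prove mitochondrial origin, which is why the text also relies on the absence of gaps.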

Phylogenetic analysis 

Sequence divergences were calculated using the Kimura two parameter distance model (Kimura 1980), and the relationships between sequences were visualized with a Neighbor-Joining tree (Saitou and Nei 1987) using MEGA software version 4 (Tamura et al. 2007). A bootstrap test of phylogeny was performed with 100,000 replications and a similar random seed. To further infer relationships among the supposed morphospecies of Pheidole, phylogenetic analyses were performed using MrBayes version 3.1.1 (Huelsenbeck and Ronquist 2001), using the default value of four Markov chains and the General Time Reversible model. The Markov chain Monte Carlo length was 2,000,000 generations, sampled every 100 generations (burn-in = 4,500). Convergence of the chains in the two runs was confirmed by examining the average standard deviation of split frequencies, which approached 0.007 in the present study. Bayesian posterior probabilities were estimated as the proportion of the trees sampled after the burn-in that contained each of the observed bipartitions (Rannala and Yang 1996; Larget and Simon 1999). The phylogenetic tree was rooted using two species of the tribe Pheidolini, Aphaenogaster texana and Messor julianus.
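
For readers unfamiliar with the Kimura two parameter model, the distance between two aligned sequences is d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of sites that differ by a transition and by a transversion, respectively. The sketch below is a minimal illustration of that formula, not the MEGA implementation; the function name and the choice to skip non-ACGT sites are assumptions.

```python
import math

PURINES = {"A", "G"}


def k2p_distance(seq1, seq2):
    """Kimura two parameter distance between two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # assumption: skip gaps and ambiguous bases
        sites += 1
        if a == b:
            continue
        if (a in PURINES) == (b in PURINES):
            transitions += 1      # purine <-> purine or pyrimidine <-> pyrimidine
        else:
            transversions += 1    # purine <-> pyrimidine
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


# Hypothetical 20 bp example: one transition (A->G) and one transversion (C->A).
d = k2p_distance("ACGTACGTACGTACGTACGT", "GAGTACGTACGTACGTACGT")
print(round(d, 4))
```

Because the model corrects for multiple substitutions at the same site, the distance is slightly larger than the raw proportion of differing sites (here 2/20 = 0.10).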

The MOTU delineation from the COI sequences relied on two criteria. First, individuals were considered to belong to the same MOTU if sequences from the same morphospecies clustered together in the phylogenetic tree. A MOTU was thus defined as the least inclusive terminal group (i.e., the group closest to the tips). Second, sequence clusters with a mean divergence less than or equal to thresholds of 2% and 3%, as proposed by Hebert et al. (2004b), were considered MOTUs. In this case, if sequences from two different morphospecies formed the same cluster, they qualified as a single MOTU only if their mean sequence divergence was less than or equal to the 2% and 3% thresholds.
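
The threshold criterion can be sketched as a check of the mean pairwise divergence within a candidate cluster taken from the tree. The specimen labels and distances below are hypothetical, and the helper names are introduced only for illustration; a cluster of one sequence has no pairs, so this sketch treats its mean divergence as zero.

```python
from itertools import combinations
from statistics import mean


def mean_divergence(members, dist):
    """Mean pairwise distance among cluster members (0.0 for a singleton)."""
    pairs = list(combinations(sorted(members), 2))
    if not pairs:
        return 0.0
    return mean(dist[frozenset(pair)] for pair in pairs)


def qualifies_as_motu(members, dist, threshold):
    """True if the cluster's mean divergence does not exceed the threshold."""
    return mean_divergence(members, dist) <= threshold


# Hypothetical Kimura two parameter distances (as proportions) between specimens.
dist = {
    frozenset(("a1", "a2")): 0.004,
    frozenset(("a1", "a3")): 0.010,
    frozenset(("a2", "a3")): 0.007,
    frozenset(("b1", "b2")): 0.126,
}

for cluster in (["a1", "a2", "a3"], ["b1", "b2"], ["c1"]):
    verdicts = {t: qualifies_as_motu(cluster, dist, t) for t in (0.02, 0.03)}
    print(cluster, verdicts)
```

In this toy example the first cluster passes both thresholds, while the second would be split into separate MOTUs under either threshold.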

Match success of the 47 sequences was further examined in relation to the COI sequences of Pheidole species already present in the GenBank library (NCBI, GenBank, http://www.ncbi.nlm.nih.gov/) (searches done between 2008 and 2009). Where the match was above 95%, the species name was allocated to that MOTU. To establish the distribution of genetic divergence and the position of the MOTUs in relation to Pheidole species from other regions, all COI sequences of the genus Pheidole that contained 640 or more bp were extracted from GenBank (sequences retrieved on 2 March 2011). A total of 141 sequences were obtained and combined with the 47 sequences from this study for further alignment. The final set of 188 sequences was trimmed to 636 bp, and a histogram and a Neighbor-Joining tree were constructed (tree in Figure 4) using the Kimura two parameter distance (Kimura 1980). This was implemented using the package ape (Paradis et al. 2004) available in R (R Development Core Team 2009).
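
The naming rule (allocate a species name only when the match exceeds 95%) can be sketched as follows. This is a simplification under stated assumptions: BLAST reports identity over local alignments, whereas the sketch compares two already aligned sequences of equal length, and the library entries and sequences are hypothetical.

```python
def percent_identity(query, subject):
    """Percentage of identical positions between two aligned sequences."""
    if len(query) != len(subject):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(a == b for a, b in zip(query.upper(), subject.upper()))
    return 100.0 * matches / len(query)


def assign_name(query, library, cutoff=95.0):
    """Return (name, identity) of the best library hit above the cutoff, else None."""
    best_name, best_identity = None, 0.0
    for name, subject in library.items():
        identity = percent_identity(query, subject)
        if identity > best_identity:
            best_name, best_identity = name, identity
    if best_identity > cutoff:
        return best_name, best_identity
    return None


# Hypothetical 25 bp sequences standing in for barcode records.
query = "ATGGCATTAGCATTAGGCATTAGCA"
library = {
    "Pheidole sp. A (hypothetical)": "ATGGCATTAGCATTAGGCATTAGCT",
    "Pheidole sp. B (hypothetical)": "CTGGCCTTAGGATTATGCATTAGCA",
}
print(assign_name(query, library))
```

A query whose best hit falls at or below the cutoff is left unnamed, which mirrors how most MOTUs in this study could not be linked to a named GenBank record.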

Results 

This study produced a final aligned 636 bp fragment, with no gaps, for all 47 sequences. Sequences were heavily AT biased (especially in the third codon position), as is expected in insect mitochondrial DNA (Crozier and Crozier 1993; Table 1). The average pairwise sequence divergence of all 47 sequences was 19%, ranging from 0 to 25% (Figure 2a). The distribution of Kimura two parameter distances for the 47 sequences showed one peak near zero and another between 18 and 24%, while the 141 COI sequences from GenBank had a peak between 16 and 24% sequence divergence (Figure 2b).

Table 1. Sequence statistics for the 47 specimens used in the analysis of the ant genus Pheidole.

Through morphology-based taxonomy, 20 morphospecies were identified (Figure 1), three of which were allocated species names (see Figure 3). DNA analysis identified 19 MOTUs, 15 of which matched the morphospecies (a match success of about 79%). Forty-six sequences matched COI sequences from GenBank at below 95%, with matches ranging from 83 to 87%. Only the sequence for morphospecies JHB14 showed a 96% match, with the GenBank sequence of Pheidole laticornis. A phylogeny combining the 141 COI sequences of Pheidole species from GenBank and our 47 sequences showed distinct clusters for the MOTUs (Figure 4, taxa in blue). Specimens with only one sequence were regarded as a MOTU using the 2% or 3% criterion. In four MOTUs, morphological taxonomy did not match the results of the DNA-based approach.

Figure 2. Distribution of pairwise distances of the ant genus Pheidole calculated using the Kimura two-parameter model (Kimura 1980) among (a) 47 Cytochrome oxidase I (COI) sequences from Rio Cachoeira Nature Reserve in Brazil and (b) 141 COI sequences from GenBank. The 47 sequences have a peak near zero and another between 18% and 24%, while the 141 sequences have a major peak between 16% and 24% sequence divergence. High quality figures are available online.

The clustering of the 47 COI sequences in the NJ and Bayesian trees showed congruence with most morphospecies groups, with most nodes immediately below (i.e., defining) clusters showing a bootstrap support and a posterior probability of 100 (Figure 3). Divergences between sequences belonging to different COI clusters (MOTUs) were far higher than divergences within a MOTU (11-fold higher), with average Kimura two parameter divergences within and between clusters being 1.8% and 20%, respectively (Figure 2). Exceptions occurred where deep sequence divergences were apparent between individuals identified as the same morphospecies (the two red bars in Figure 3, JHB03285G01 and JHB01425G01). These have a mean sequence divergence of 12.6%, which is about 6-fold and 4-fold higher than the 2% and 3% thresholds, respectively, that were used to allocate individuals to their clusters.
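
The ratios quoted in the Results follow from simple division of the reported percentages. The short check below uses only the figures given in the text; the variable names are introduced for illustration.

```python
# Percentages reported in the Results.
within_cluster = 1.8      # mean divergence within MOTUs (%)
between_cluster = 20.0    # mean divergence between MOTUs (%)
cryptic_pair = 12.6       # mean divergence between the two "red bar" sequences (%)

fold_between_within = between_cluster / within_cluster
fold_over_thresholds = {t: cryptic_pair / t for t in (2.0, 3.0)}
match_success = 100 * 15 / 19   # 15 of 19 MOTUs matched morphospecies

print(round(fold_between_within, 1))                          # about 11-fold
print({t: round(v, 1) for t, v in fold_over_thresholds.items()})
print(round(match_success))                                   # about 79%
```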

Discussion 

The results indicated that COI sequences showed promising success in allocating morphologically pre-defined individuals into distinct Pheidole MOTUs. Most sequences clustered into cohesive, well-differentiated groups, most of which showed congruence with the predefined morphospecies. The majority of nodes immediately defining the clusters showed remarkably high levels of nodal support. Furthermore, most clusters remained distinct as sample sizes increased during the progress of the work, an indication that such groups included distinct COI lineages rather than scattered sequence variation (Hajibabaei et al. 2006). In addition, sequence divergence in COI mitochondrial DNA within MOTUs (clusters) was usually much lower than 2%, whereas divergence between the clusters was often greater, but remained within the range of divergences between COI sequences of Pheidole species from GenBank. This result is in general agreement with empirical levels of divergence found between species in barcoding studies (Hebert et al. 2003b). These findings support the view that most of the identified morphospecies are indeed distinct lineages (Wiens and Penkrot 2002).

A total of 20 morphospecies were recovered using morphological characters, whereas 19 MOTUs were recovered using barcodes. The diversity estimates (MOTUs) using threshold values of 2% and 3% were similar to the diversity estimate based on morphological characters. Where the 2-3% MOTUs and the morphological species estimate differed, either different morphospecies clustered to form the same MOTU (as was the case with the two MOTUs represented by the dark blue bars in Figure 3) or sequences from the same morphospecies showed deep sequence divergence (MOTUs represented by the two red bars in Figure 3) and were thus allocated to different MOTUs.

Morphological reexamination of the two MOTUs that each combined two morphospecies (i.e., JHB13 and JHB17, JHB12 and JHB21; MOTUs indicated with dark blue bars in Figure 3) revealed significant differences in morphometric characters between the morphospecies sharing a MOTU, as evidenced by the hierarchical clustering (Figure 1). The grouping into one MOTU may be due to incomplete lineage sorting or even mitochondrial introgression (Hebert and Gregory 2005; Meyer and Paulay 2005). Incomplete lineage sorting or gene introgression is plausible in taxa with shared sequences or haplotypes because the species occurred in the same locality. This explanation is consistent with their low mean sequence divergence (below 0.02) and the high bootstrap support for their respective clusters. In contrast, the two MOTUs representing possible cryptic taxa (two red bars; JHB02 in Figure 3) had high mean sequence divergence and their clustering was not strongly supported, which suggests that they may be different species. Further reevaluation of the four MOTUs for either introgression or cryptic diversity using mitochondrial DNA was, however, hampered by the limited number of samples.

The results further revealed very low success when matching MOTUs with Pheidole species already in a COI library. After integrating the 47 sequences with those from GenBank, clusters of the MOTUs remained stable within the phylogeny, with only four MOTUs forming monophyletic clusters with species from GenBank (Figure 4). The species name Pheidole laticornis was allocated to MOTU JHB05239G01 based on the criteria for allocating species names set in this study and in others (Meier et al. 2006). Overall, it was difficult to allocate species names to MOTUs based on COI sequences in the library. This observation does not imply that COI barcodes cannot be used in species identification, but for the MOTUs in this study, other strategies will be necessary. The low success in matching these MOTUs with GenBank sequences was likely because only a few of the more than 600 described species of Pheidole are included in GenBank. Also, there are many undescribed ant species in the Neotropical region, and there are suggestions that many species in the Mata Atlântica may be endemic (R. Brandl, personal communication).

Congruence between the two species identification approaches was not very high, which may be attributed to the criteria applied in delimiting MOTUs. For instance, the threshold approach is vulnerable to both false positives and false negatives (Meyer and Paulay 2005). Regardless of such shortcomings in both morphological taxonomy and DNA barcoding (DeSalle et al. 2005; Pires and Marinoni 2010), the low congruence does not compromise their effective use for species identification (Smith et al. 2005); on the contrary, either approach helps to highlight taxonomic assignments in need of further scrutiny (Hebert and Gregory 2005; Padial et al. 2010). Such scenarios call for a more thorough survey of morphological and COI diversity among the members of the taxa involved. Moreover, in cases of introgression, the analysis of a rapidly evolving nuclear sequence, such as the internal transcribed spacer region of the ribosomal repeat, will aid taxonomic resolution (Hebert et al. 2003a). However, this study did not employ other molecular markers for species delimitation and was limited to mitochondrial DNA.

Five rare morphospecies were represented by only a single sequence (MOTUs represented by the green bars in Figure 3) owing to the sampling techniques used, and were coded as MOTUs by the 2% and 3% criterion. A previous study on DNA barcoding of ants using these thresholds (Smith et al. 2005) recommended that only by sampling multiple individuals from supposed species, or MOTUs, will inter-specific variation be properly assessed. Otherwise, it is impossible to test the hypothesis of species-level monophyly (Funk and Omland 2003), which could lead to an overestimation of biodiversity. This is a valid concern in an analysis of MOTUs from inventories of hyperdiverse groups such as ants, which often include many taxa known only from single individuals (Fisher 1999; Longino et al. 2002). Morphological identification of such rare species also calls for the use of multiple individuals in order to assess the consistency of taxonomic characters among individuals of a given taxon. With additional inventories in the future, many of these rare species will be represented in collections by more specimens.

The aim of this study was to investigate the efficacy of DNA barcoding in delimiting pre-defined species. The results provide an example of how DNA barcoding can complement a more conventional morphological approach without competing with or replacing it (Hebert and Gregory 2005). Moreover, thresholds of 2% and 3% proved to be effective in delineating species in the genus Pheidole. Despite the shortcoming in match success, this study demonstrated that diversity estimates using COI MOTUs together with morphological taxonomy offer a means to map the occurrence of ant species that still wait to be formally described and included in identification keys.

Figure 3. Linearized Bayesian tree of the ant genus Pheidole from Rio Cachoeira Nature Reserve in the Brazilian Atlantic Forest, which defines 19 MOTUs. The clustering of individual sequences in the tree indicates the membership of each MOTU. MOTUs were inferred from a tree-dependent clustering process, coupled with thresholds of 2%. Each colored bar represents a MOTU. The five green bars represent cases where only one individual was sequenced, the two red bars indicate possible cryptic taxa, and the two dark blue bars indicate MOTUs with shared taxa. The three names in front of the bars represent Pheidole species whose names were assigned based on morphological taxonomy, and the numbers preceded by JHB represent the different morphospecies. Posterior probability values for the Bayesian tree and bootstrap support values for the Neighbor-Joining tree (in bold) above 50% are indicated on the nodes. A dash (-) indicates bootstrap values below 50%. High quality figures are available online.

Figure 4. A combined phylogeny of COI sequences of the genus Pheidole from GenBank and the 47 sequences (taxa in blue) from Rio Cachoeira Nature Reserve. The 47 sequences formed distinct clusters in relation to those from GenBank. The tree is rooted using the taxa in red. High quality figures are available online.